Abstract

The minimum spanning tree, based on the concept of ultrametricity, is constructed from 
the correlation matrix of stock returns and provides a meaningful economic taxonomy of 
the stock market. In order to study the dynamics of this asset tree, we characterize it by
its normalized length and by its mean occupation layer, the latter measured from an
appropriately chosen central node. We show how the tree evolves over time, and how it shrinks particularly
strongly during a stock market crisis. We then demonstrate that the assets of the optimal 
Markowitz portfolio lie practically at all times on the outskirts of the tree. We also show 
that the normalized tree length and the investment diversification potential are very strongly 
correlated. 
Portfolio optimization is one of the basic tools of hedging in a risky and extremely complex
financial environment. Many attempts have been made to solve this central problem, ranging
from the classical approach of Markowitz to more sophisticated treatments including
spin-glass-type studies [0]. In all of these attempts, correlations between asset prices play a
crucial role. A closely related problem is that of economic taxonomy. In a recent paper [0],
Mantegna suggested studying the clustering of companies by using the correlation matrix of
asset returns, where a simple transformation of the correlations into distances produces a
connected graph. In this graph the nodes are the companies, the 'distances' between them
are obtained from the correlation coefficients, and the clusters of companies are identified
by means of the minimum spanning tree. It turned out that the hierarchical structure of the
financial market identified in this way agrees with the results obtained by an independent
clustering method based on Potts super-paramagnetic transitions [0]. In another paper,
Bonanno et al. studied the time evolution of stock indices and identified significant changes
in the world economy by using appropriate time horizons and the minimum spanning tree
clustering method. The hierarchical structure revealed by the minimum spanning tree also
appears to carry information about the influential power of the companies. This network of
influence was recently investigated by means of a time-dependent correlation method [0].
Other attempts have been made to understand the structure of correlation matrices in a
highly random setting using the theory of random matrices [0].

In this paper, we study the minimum spanning tree determined from correlations between
stock returns and call it an 'asset tree'. Although this asset tree can reveal a great deal about
the taxonomy of the market at a given time, it only represents a snapshot of an evolving
complex system. This evolution reflects the changing power structure in the market and
manifests the passing of different products and product generations, new technologies,
management teams, alliances and partnerships, amongst many other things. This is why
exploring the asset tree dynamics can provide new insights into the market. Here, by
studying the time evolution of the asset tree, we show that although the structure of the tree
changes with time, the companies of the optimal Markowitz portfolio are practically always on
its outer leaves. We also study the robustness of the tree topology and the consequences of
market events on its structure. The minimum spanning tree, as a strongly pruned
representative of asset correlations, is found to be robust and descriptive of stock market events.

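We start our analysis by assuming that there are $N$ assets, with price $P_i(t)$ for asset $i$
at time $t$. Then the logarithmic return of stock $i$ is $r_i(t) = \ln P_i(t) - \ln P_i(t-1)$, which,
for a certain consecutive sequence of trading days, forms the return vector $\mathbf{r}_i$. In order to
characterize the synchronous time evolution of stocks, we use the equal-time correlation
coefficients between stocks $i$ and $j$, defined as

$$\rho_{ij} = \frac{\langle \mathbf{r}_i \mathbf{r}_j \rangle - \langle \mathbf{r}_i \rangle \langle \mathbf{r}_j \rangle}{\sqrt{\left[\langle \mathbf{r}_i^2 \rangle - \langle \mathbf{r}_i \rangle^2\right]\left[\langle \mathbf{r}_j^2 \rangle - \langle \mathbf{r}_j \rangle^2\right]}}, \qquad (1)$$

where $\langle \cdots \rangle$ indicates a time average over the trading days included in the return vectors.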
These correlation coefficients form an $N \times N$ matrix with $-1 \le \rho_{ij} \le 1$, which is then
transformed into an $N \times N$ distance matrix with elements $d_{ij} = \sqrt{2(1 - \rho_{ij})}$, such that
$2 \ge d_{ij} \ge 0$. The $d_{ij}$ fulfill the requirements of distances, even those of ultrametricity [?].
We now use the distance matrix to determine the minimum spanning tree (MST) of the
distances, denoted by $\mathbb{T}$, which is a simply connected graph that connects all the $N$ nodes of
the graph with $N - 1$ edges such that the sum of all edge weights, $\sum_{d_{ij} \in \mathbb{T}} d_{ij}$, is minimum. It
should be noted that in constructing the minimum spanning tree, we are effectively reducing
the information space from $N(N-1)/2$ separate correlation coefficients to $N - 1$ tree edges.
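To make the construction concrete, here is a minimal, dependency-free Python sketch of the pipeline just described: log returns, the equal-time correlation coefficient $\rho_{ij}$ (with $\langle \cdots \rangle$ taken as a plain mean over the window), the distance $d_{ij} = \sqrt{2(1-\rho_{ij})}$, and the MST. The text does not say which MST algorithm was used; this sketch assumes Kruskal's algorithm with a union-find structure, and all function names and the toy prices are hypothetical.

```python
import math

def log_returns(prices):
    """r_i(t) = ln P_i(t) - ln P_i(t - 1) for one asset's price series."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def pearson(x, y):
    """Equal-time correlation coefficient; <...> is a plain mean over the window."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    mxy = sum(a * b for a, b in zip(x, y)) / n
    mxx = sum(a * a for a in x) / n
    myy = sum(b * b for b in y) / n
    # Assumes neither series is constant; a zero variance would divide by zero.
    return (mxy - mx * my) / math.sqrt((mxx - mx * mx) * (myy - my * my))

def distance(rho):
    """d_ij = sqrt(2 (1 - rho_ij)), so that 0 <= d_ij <= 2."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))  # max() guards against rho = 1 + eps

def minimum_spanning_tree(n, dist):
    """Kruskal's algorithm: dist maps (i, j), i < j, to d_ij; returns N - 1 edges."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    tree = []
    for (i, j), d in sorted(dist.items(), key=lambda item: item[1]):
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components, so it cannot close a cycle
            parent[ri] = rj
            tree.append((i, j, d))
            if len(tree) == n - 1:
                break
    return tree

def asset_tree(returns):
    """returns[i] is the return vector r_i of asset i within one time window."""
    n = len(returns)
    dist = {(i, j): distance(pearson(returns[i], returns[j]))
            for i in range(n) for j in range(i + 1, n)}
    return minimum_spanning_tree(n, dist)

# Toy example: three hypothetical price series of equal length.
prices = [
    [10.0, 10.2, 10.1, 10.4, 10.3],
    [20.0, 20.5, 20.3, 20.9, 20.6],
    [5.0, 4.9, 5.0, 4.8, 4.9],
]
print(asset_tree([log_returns(p) for p in prices]))
```

Sorting all $N(N-1)/2$ candidate edges dominates the cost, which is harmless for 116 stocks. A library routine such as SciPy's `scipy.sparse.csgraph.minimum_spanning_tree` could replace the hand-written Kruskal step.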

The dataset we have used in this study consists of daily closing prices for 116 stocks of
the S&P 500 index [?], which were obtained from the Yahoo website. The time period
of this data extends from the beginning of 1982 to the end of 2000, including a total of
4787 price quotes per stock after the removal of a few days due to incomplete data. We
divide this data into $M$ windows $t = 1, 2, \ldots, M$ of width $T$, corresponding to the number of
daily returns included in the window. Different windows overlap with each other, the extent
of which is dictated by the window step length parameter $\delta T$, describing the displacement
between two consecutive windows, also measured in trading days. The choice of the window
width is a trade-off between too noisy data for small window widths and too smoothed data
for large ones. In our studies, $T$ was typically set between 500 and 1500 trading days, i.e.,
between 2 and 6 years, and $\delta T$ to one month, i.e., about 21 trading days.
This is in accordance with the suggestions of the Basel committee [10].
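The windowing scheme is simple index arithmetic. The sketch below assumes that windows start at the first daily return and that only complete windows are kept; the text does not state its boundary convention, so the resulting window counts $M$ are illustrative only. The function name `window_starts` is hypothetical.

```python
def window_starts(num_returns, width, step):
    """Start indices of the overlapping windows t = 1, ..., M.

    Each window holds `width` daily returns (T), and consecutive windows are
    displaced by `step` trading days (delta T); incomplete windows are dropped.
    """
    if width > num_returns:
        return []
    return list(range(0, num_returns - width + 1, step))

# 4787 price quotes per stock give 4786 daily log returns.
for T in (500, 1000, 1500):
    M = len(window_starts(4786, T, 21))
    print(f"T = {T}: M = {M} windows")  # 205, 181 and 157 windows respectively
```

Window $t$ then covers returns `start : start + T`; consecutive windows share $T - \delta T$ returns, which is why neighbouring trees are expected to be similar.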

In order to study the temporal state of the market, we define the normalized tree length as

$$L(t) = \frac{1}{N-1} \sum_{d_{ij} \in \mathbb{T}^t} d_{ij}, \qquad (2)$$

where $t$ denotes the time at which the tree $\mathbb{T}^t$ is constructed, and $N - 1$ is the number of edges
present in the MST. To characterize the position of companies in the graph, i.e., the layers on
which the different nodes are located at a given time, we introduce the concept of a central
node. Although there is some arbitrariness in the choice of the central node, we propose that it
should be central in the sense that any change in its price strongly affects the course of events
in the market as a whole. Thus the central node would be the company most strongly connected
to its nearest neighbors in the tree: the node for which the sum of the correlation coefficients of
its incident edges is maximal, and/or the node with the highest vertex degree (the number of
edges incident with the vertex). Note also that one can use either a static (fixed at all times)
or a dynamic (continuously updated) central node without considerable effects on the results.
In our studies, General Electric (GE) was chosen as the central node, since it was the most
connected node for about 70% of the period considered. A typical asset tree is shown in Figure 1,
where it is evident that companies become clustered by business sectors.
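The sketch below computes the normalized tree length $L(t)$ from an MST edge list and picks a central node by either of the two criteria mentioned above. It assumes edges are stored as `(i, j, d_ij)` triples and recovers $\rho_{ij}$ by inverting $d_{ij} = \sqrt{2(1-\rho_{ij})}$; the function names and the toy tree are hypothetical. The example shows that the two criteria need not agree, which is one source of the arbitrariness in the choice.

```python
from collections import defaultdict

def normalized_tree_length(tree):
    """L(t): mean distance over the N - 1 edges (i, j, d_ij) of the MST."""
    return sum(d for _, _, d in tree) / len(tree)

def central_node(tree, criterion="correlation"):
    """Node with the largest sum of incident correlations, or with the highest degree."""
    degree = defaultdict(int)
    strength = defaultdict(float)
    for i, j, d in tree:
        rho = 1.0 - d * d / 2.0  # invert d_ij = sqrt(2 (1 - rho_ij))
        for v in (i, j):
            degree[v] += 1
            strength[v] += rho
    score = degree if criterion == "degree" else strength
    return max(score, key=score.get)

# Hypothetical tree: node 0 has the most edges, but only weakly correlated ones.
tree = [(0, 1, 1.4), (0, 2, 1.4), (0, 3, 1.4), (3, 4, 0.1), (4, 5, 0.1)]
print(normalized_tree_length(tree))
print(central_node(tree, "degree"))       # 0
print(central_node(tree, "correlation"))  # 4
```

A dynamic central node corresponds to calling `central_node` on every window's tree; a static one, like GE here, is fixed once.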
Figures 2 (a) and (b) show how the mean correlation coefficient, defined as
$\bar{\rho} = \frac{1}{N(N-1)/2} \sum_{i<j} \rho_{ij}$, where we consider only the non-diagonal and independent
$\rho_{ij}$, and the normalized tree length $L$ evolve with time. The two curves indeed look like mirror
images, which is corroborated by the fact that their mutual correlation coefficient is $-0.96$.
This indicates that the minimum spanning tree is a strongly reduced representative of the whole
correlation matrix and bears the essential information about asset correlations. As further
evidence that the MST retains the salient features of the stock market, the 1987 market crash
can be seen quite accurately in Figure 2. The two sides of the ridge actually converge as a
result of extrapolating the window width $T \to 0$ [11]. In Figure 2 (a), the mean correlation
of stocks is very high during the crash. This is because the market forces act strongly on all
the stocks and force the market to behave in a unified way. Figure 2 (b) corroborates this:
$L(t)$ decreases, indicating that the nodes on the graph are drawn closer together.
In order to characterize the spread of nodes on the graph, we introduce the mean
occupation layer

$$l(t) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{lev}(v_i^t), \qquad (3)$$

where $\mathrm{lev}(v_i^t)$ denotes the level of vertex $v_i$ at time $t$ relative to the central node, whose
level is taken to be zero. We find that $l(t)$ reaches a very low value at the time of a market crisis
(see Figure 3).
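Both snapshot measures of this paragraph are short computations. In the minimal sketch below, the mean correlation coefficient $\bar{\rho}$ averages the $N(N-1)/2$ entries above the diagonal of a square correlation matrix, and the level of each vertex is found by breadth-first search from the central node; since a tree has exactly one path between any two nodes, the BFS depth equals the level. Edges are assumed to be tuples whose first two entries are node labels, and the function names and toy tree are hypothetical.

```python
from collections import deque

def mean_correlation(rho):
    """Mean of the N(N - 1)/2 independent off-diagonal entries rho_ij, i < j."""
    n = len(rho)
    total = sum(rho[i][j] for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

def levels(tree, center):
    """lev(v): number of tree edges between v and the central node (level 0), via BFS."""
    adjacency = {}
    for i, j, *_ in tree:
        adjacency.setdefault(i, []).append(j)
        adjacency.setdefault(j, []).append(i)
    lev = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, []):
            if v not in lev:
                lev[v] = lev[u] + 1
                queue.append(v)
    return lev

def mean_occupation_layer(tree, center):
    """l(t): average level over all N nodes of the spanning tree."""
    lev = levels(tree, center)
    return sum(lev.values()) / len(lev)

tree = [(0, 1), (0, 2), (2, 3)]
print(levels(tree, 0))                 # {0: 0, 1: 1, 2: 1, 3: 2}
print(mean_occupation_layer(tree, 0))  # 1.0
```

When the tree contracts around the central node during a crash, more nodes sit at low levels and $l(t)$ drops accordingly.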

Next, we apply the concepts and measures discussed above to portfolio analysis. We
consider a minimum risk Markowitz portfolio $P(t)$ with the asset weights $w_1, w_2, \ldots, w_N$.
In the Markowitz portfolio optimization scheme, financial assets are characterized by their
average return and risk, both determined from historical price data, where risk is measured
by the standard deviation of returns. The aim is to optimize the asset weights so that the
overall portfolio risk is minimized for a given portfolio return [?]. In the minimum spanning
tree framework, the task is to determine how the assets are located with respect to the central
node. Intuitively, we expect the weights to be distributed on the outskirts of the graph. In
order to describe what happens, we define a single measure, the weighted portfolio layer

$$l_P(t) = \sum_{i \in P(t)} w_i \, \mathrm{lev}(v_i^t), \qquad (4)$$

where we impose the constraint $w_i \ge 0$ for all $i$, since we assume that there is no short-selling.
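Given the vertex levels and a set of portfolio weights, the weighted portfolio layer is a weighted average of levels. In this minimal sketch the normalization $\sum_i w_i = 1$ is our assumption (the usual Markowitz budget constraint, not stated explicitly in the text), and the asset labels, levels and weights are made up; the weights themselves would come from a minimum-risk optimization that is not shown here.

```python
def weighted_portfolio_layer(weights, lev):
    """l_P(t) for long-only weights that sum to one.

    weights: asset -> w_i; lev: asset -> level relative to the central node.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("no short-selling: every w_i must be >= 0")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * lev[asset] for asset, w in weights.items())

lev = {"A": 0, "B": 1, "C": 1, "D": 2}  # hypothetical levels; their mean is 1.0
print(weighted_portfolio_layer({"B": 0.25, "D": 0.75}, lev))  # 1.75
```

With these made-up numbers the portfolio sits 0.75 levels further out than the mean occupation layer, which is the kind of comparison plotted in Figure 3.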

Figure 3 shows the behaviour of the mean layer $l(t)$ and the weighted minimum risk
portfolio layer $l_P(t)$. We find that the portfolio layer is higher than the mean layer practically
at all times. The difference in layers depends to a certain extent on the window width: for
$T = 500$ it is about 0.76 and for $T = 1000$ about 0.97. As the stocks of the minimum risk
portfolio are found on the outskirts of the graph, we expect larger graphs (higher $L$) to have
greater diversification potential, i.e., greater scope for the stock market to eliminate the specific
risk of the minimum risk portfolio. To test this, we calculated the mean-variance frontiers for
the ensemble of 116 stocks using $T = 500$ as the window width. In Figure 2 (c), we plot the
level of portfolio risk as a function of time and find a striking similarity between the risk curve
and the curves of the mean correlation coefficient $\bar{\rho}$ and the normalized tree length $L$ in
Figures 2 (a) and (b). The correlation between the risk and $\bar{\rho}$ is 0.82, while the correlation
between the risk and $L$ is $-0.90$. Therefore, the normalized tree length describes the
diversification potential of the market better than the mean correlation coefficient does.
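The link between correlations and diversification potential can already be seen with two assets. The sketch below is only a toy illustration with assumed volatilities, not the 116-stock mean-variance computation behind Figure 2 (c), which requires a quadratic-programming solver. For two assets the minimum-variance weight has the closed form $w^* = (\sigma_2^2 - \rho\sigma_1\sigma_2)/(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)$; because the portfolio variance is a convex quadratic in $w$, clipping $w^*$ to $[0, 1]$ gives the long-only optimum. The function name is hypothetical.

```python
import math

def min_risk_two_assets(s1, s2, rho):
    """Long-only minimum-variance portfolio of two assets.

    s1, s2: standard deviations of returns; rho: their correlation coefficient.
    Returns (w, risk): w is the weight of asset 1, 1 - w that of asset 2.
    """
    cov = rho * s1 * s2
    denom = s1 * s1 + s2 * s2 - 2.0 * cov  # variance of r1 - r2, never negative
    w = (s2 * s2 - cov) / denom if denom > 0 else 0.5  # denom == 0: variance flat in w
    w = min(1.0, max(0.0, w))  # no short-selling: clip to [0, 1]
    var = w * w * s1 * s1 + (1.0 - w) ** 2 * s2 * s2 + 2.0 * w * (1.0 - w) * cov
    return w, math.sqrt(max(var, 0.0))

for rho in (-0.5, 0.0, 0.5, 0.9):
    w, risk = min_risk_two_assets(0.2, 0.2, rho)
    print(f"rho = {rho:+.1f}: w = {w:.2f}, risk = {risk:.4f}")
```

For equal volatilities the optimal split stays at one half while the minimal risk rises with $\rho$; for strongly correlated assets with unequal volatilities the constraint binds and the whole portfolio moves into the less volatile asset, leaving nothing to diversify.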

Finally, in order to investigate the robustness of the minimum spanning tree topology,
we define the survival ratio of tree edges (the fraction of edges found in both graphs)
at time $t$ as

$$\sigma(t) = \frac{1}{N-1} \left| E(t) \cap E(t-1) \right|. \qquad (5)$$

Here $E(t)$ refers to the set of edges of the graph at time $t$, $\cap$ is the intersection operator,
and $|\ldots|$ gives the number of elements in the set. Under normal circumstances, the graphs at
two consecutive time windows (for small values of $\delta T$) should look very similar.
Whereas some of the differences can reflect real changes in the asset taxonomy, others may
simply be due to noise. We find that $\sigma(t) \to 1$ as $\delta T \to 0$, indicating that the graphs are
stable in this limit, and hence our portfolio analysis is justified.
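Comparing two trees edge by edge is a set intersection. In this minimal sketch, edges are treated as undirected pairs (an assumption about storage, since the same edge may be listed as $(i, j)$ in one window and $(j, i)$ in the next), and the two toy trees on four nodes are hypothetical.

```python
def edge_set(tree):
    """Undirected tree edges as frozensets, so that (i, j) and (j, i) are the same edge."""
    return {frozenset((i, j)) for i, j, *_ in tree}

def survival_ratio(tree_now, tree_prev):
    """sigma(t) = |E(t) ∩ E(t - 1)| / (N - 1)."""
    now, prev = edge_set(tree_now), edge_set(tree_prev)
    return len(now & prev) / len(now)  # len(now) == N - 1 for a spanning tree

prev = [(0, 1), (1, 2), (2, 3)]
now = [(1, 0), (1, 2), (1, 3)]
print(survival_ratio(now, prev))  # 2 of the 3 edges survive
```

Computing `survival_ratio` for every pair of consecutive windows, for several values of $\delta T$, is one way to check the stated limit $\sigma(t) \to 1$.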

In summary, we have studied the dynamics of asset trees and applied them to portfolio
analysis. We have shown that the tree evolves over time and have found that the normalized
tree length decreases and remains low during a crash, implying that the asset tree shrinks
particularly strongly during a stock market crisis. We have also found that the mean
occupation layer fluctuates as a function of time and drops sharply at the time of a market
crisis due to topological changes in the asset tree. As for the portfolio analysis, we found
that the stocks included in the minimum risk portfolio tend to lie on the outskirts of the
asset tree: on average, the weighted portfolio layer is about one level higher, i.e., further
away from the central node, than the mean occupation layer for a window width of four trading
years. The correlation between the risk and the mean correlation coefficient was found to be
quite strong, though not as strong as the correlation between the risk and the normalized tree
length. Thus we conclude that the diversification potential of the market is very
closely related to the behaviour of the normalized tree length.
Figure Captions

Fig. 1 : A typical asset taxonomy (minimum spanning tree) graph connecting the examined
116 stocks of the S&P 500 index. The graph was produced using a four-year window width
and is centered on January 1, 1998. Business sectors are indicated according to Forbes,
http://www.forbes.com. In this graph, General Electric (GE) was used as the central node,
and eight layers can be identified.

Fig. 2 : Plots of (a) the mean correlation coefficient $\bar{\rho}$, (b) the normalized tree length $L$
and (c) the risk of the minimum risk portfolio, as functions of time. The risk is determined 
with weight limits of zero lower bound (no short-selling) and unit upper bound (any asset 
may constitute the entire portfolio). For all plots the window width is T = 500, i.e., two 
trading years. 

Fig. 3 : Plots of the mean occupation layer $l$ and the weighted portfolio layer $l_P$ as functions of
time. This plot is based on the window width $T = 1000$, i.e., four trading years.